Computer vision - Lab 2

Agenda

Helpers

To perform the tasks, it is necessary to import the libraries used in the script and download the data on which we will be working.

In this script we will be using:

The Colab platform requires a special way to display images with OpenCV. If the notebook is run in Colab, execute the following code:

A function that compares two images, checking whether their values are the same (up to minor numerical errors) and whether the element type is uint8.
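Such a helper might look like this (a minimal sketch; the function name and the tolerance of 1 are assumptions, not part of the original notebook):

```python
import numpy as np

def compare_images(img_a, img_b, tolerance=1):
    """Return True if both images are uint8 and differ by at most `tolerance`."""
    if img_a.dtype != np.uint8 or img_b.dtype != np.uint8:
        return False
    if img_a.shape != img_b.shape:
        return False
    # cast to a wider signed type so the subtraction cannot wrap around
    diff = np.abs(img_a.astype(np.int16) - img_b.astype(np.int16))
    return bool(diff.max() <= tolerance)
```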

Color spaces

Digital image storage is the representation of color in a certain domain that, to some extent, reflects how humans perceive light. The most intuitive is the spatial domain, in which the image consists of pixels arranged as a 2D matrix. Each pixel has its own intensity value. This value can be represented in many ways, for example:

Different color spaces provide different image processing options. For example, in the HSV space we can read the brightness directly, while in Grayscale it may be easier to detect the contours of objects in the scene.

Besides the spatial domain, the image can also be processed in the frequency domain. A 2D matrix image can be treated as a two-dimensional signal, so it is subject to all operations on signals, such as the Fourier transform. Representing an image in the frequency domain makes it easier to detect edges and blurred areas and to filter the image.

RGB color space:

rgb.png

CMYK color space:

cmyk.png

Image loading

The most popular libraries in Python for image processing are:

There are some differences in how these libraries handle images. OpenCV works by default on images in the BGR format, while Pillow uses RGB. BGR is simply the reversed channel order for each pixel (Blue, Green, Red).
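Since BGR is just RGB with the channel order reversed, the conversion can be done with NumPy slicing alone (a sketch):

```python
import numpy as np

def bgr_to_rgb(img):
    # reversing the last axis swaps the B and R channels; the same call
    # also converts in the other direction (RGB -> BGR)
    return img[..., ::-1]

# a single "pixel" stored as BGR: blue=10, green=20, red=30
bgr = np.array([[[10, 20, 30]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
```

Applying the function twice recovers the original channel order.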

Conversion between color spaces

We can move freely between color spaces during image processing. Moreover, most libraries have mechanisms of conversions between the most popular spaces implemented by default.

RGB - HSV

Assuming input data R, G, B, where $ R, G, B \in [0,1] $ (if they are in the range $[0,255]$ then they should be divided by $255.0$). The conversion from RGB to HSV space can be represented as:

$C_{max} = max(R,G,B)$
$C_{min} = min(R,G,B)$
$\Delta = C_{max} - C_{min}$


${ H=\left\{ \begin{array}{ll} 0{\hspace{0.5cm}\text{for}\hspace{0.5cm}} \Delta = 0\\ 60 * (\frac{G - B}{\Delta} \mod 6){\hspace{0.5cm}\text{for}\hspace{0.5cm}} C_{max} = R\\ 60 * (\frac{B - R}{\Delta} + 2){\hspace{0.5cm}\text{for}\hspace{0.5cm}} C_{max} = G\\ 60 * (\frac{R - G}{\Delta} + 4){\hspace{0.5cm}\text{for}\hspace{0.5cm}} C_{max} = B \end{array} \right.}$

${ S=\left\{ \begin{array}{ll} 0{\hspace{0.5cm}\text{for}\hspace{0.5cm}} C_{max} = 0\\ \frac{\Delta}{C_{max}} {\hspace{0.5cm}\text{for}\hspace{0.5cm}} C_{max} \neq 0 \end{array} \right.}$

$V = C_{max}$
Note: S and V should be scaled to values in the range $[0, 255]$. The range of Hue values for this algorithm is $[0, 359]$; however, OpenCV uses the range $[0, 179]$, so the resulting H value should be divided by 2.
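The equations above can be vectorized with NumPy, for example as follows (a sketch under the stated assumptions, not the reference solution for Task 1; rounding may differ slightly from OpenCV's):

```python
import numpy as np

def rgb_to_hsv(img):
    """Convert an RGB uint8 image to HSV in OpenCV's ranges (H in [0, 179])."""
    rgb = img.astype(np.float64) / 255.0
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cmax = rgb.max(axis=-1)
    cmin = rgb.min(axis=-1)
    delta = cmax - cmin
    safe_delta = np.where(delta == 0, 1, delta)   # avoid division by zero
    # np.select picks the first matching case, so delta == 0 wins
    h = np.select(
        [delta == 0, cmax == r, cmax == g, cmax == b],
        [0.0,
         60 * (((g - b) / safe_delta) % 6),
         60 * ((b - r) / safe_delta + 2),
         60 * ((r - g) / safe_delta + 4)],
    )
    safe_cmax = np.where(cmax == 0, 1, cmax)      # avoid division by zero
    s = np.where(cmax == 0, 0, delta / safe_cmax)
    v = cmax
    # scale to OpenCV's uint8 ranges: H / 2, S * 255, V * 255
    return np.stack([h / 2, s * 255, v * 255], axis=-1).astype(np.uint8)
```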

RGB - Grayscale

The conversion from RGB to Grayscale space can be represented as:

$$Gray = 0.2989 * R + 0.5870 * G + 0.1140 * B$$
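A minimal vectorized sketch of this formula (the rounding mode is an assumption; OpenCV may truncate instead):

```python
import numpy as np

def rgb_to_gray(img):
    """Weighted sum of the R, G, B channels (ITU-R BT.601 luma weights)."""
    weights = np.array([0.2989, 0.5870, 0.1140])
    # (H, W, 3) @ (3,) contracts the channel axis, giving an (H, W) result
    return (img.astype(np.float64) @ weights).round().astype(np.uint8)
```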

OpenCV implementation

OpenCV includes a ready-made function, cvtColor, which takes the image to be processed as the first parameter and a constant specifying the type of conversion as the second (named constants such as COLOR_RGB2BGR).

Task 1

Implement the following conversions:

The results will be compared with the results of the functions included in OpenCV.

Note: There may be slight differences in pixel values between the same transforms, which may be due to numerical errors or simply a different type of rounding / truncating the values when transforming a floating point number to an integer.

Note 2: Don't use loops that iterate over each pixel.

Custom color spaces

Color spaces such as RGB and HSV are standard spaces that result from the nature of light. However, nothing prevents you from creating your own color spaces. Below, pseudo-coloring methods are presented - that is, assigning pixels a color based on an artificially prepared color space.

The Hot space may be particularly interesting, as it assigns a warmer color (yellow) to pixels of greater intensity (in grayscale).

To apply our own color space, we can use OpenCV's LUT (lookup table) function. In the example below, three color mappings are prepared (lut_1, lut_2, lut_3). Each of these tables maps each of the 256 possible uint8 values to a new value. The function can also be applied to multi-channel images.

Task 2

For a Lenna image in Grayscale space, first transform it to a space containing 8 colors (buckets), and then convert the image to a Hot space.

Display intermediate results.

Point operations

A point operation is a transformation of one image into another in which the value of each output pixel depends only on the corresponding pixel of the input image. Formally, an image operation can be written as follows:

$$F : I_{in} \rightarrow I_{out}$$

where:

with a constraint:

$$I_{out}(x,y) = F(I_{in}(x,y))$$

meaning that an output image pixel is the result of the F function on the corresponding input image pixel.

Examples of point operations are (for simplicity, we assume that $i \in [0, 1]$):

Helper functions

Point transformations

Below are the implementations of identity, inversion, and gamma correction operations and their visualizations.
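Such functions could be sketched as follows (assuming intensities $i \in [0, 1]$; the default gamma value is an example):

```python
import numpy as np

def identity(i):
    # leaves the intensity unchanged
    return i

def inversion(i):
    # reflects the intensity: dark becomes bright and vice versa
    return 1.0 - i

def gamma_correction(i, gamma=2.2):
    # gamma > 1 darkens mid-tones, gamma < 1 brightens them
    return np.power(i, gamma)
```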

Task 3

Implement the following transformations:

Point operations for images

Above, we have defined the basic image transformation functions. These operations can be directly applied to images, resulting in a new transformed image.

Below are the same functions as above applied to the image of Lenna.

Helper functions

Arithmetic operations for images

Image pixel intensity / frequency values are represented as numbers (integer or floating point). This makes it possible to perform certain operations on pairs (or sets) of images, such as addition, subtraction, averaging, etc.

Below, a naive solution is presented to the problem of merging images that show the same object with different sharpness.

We can perform the averaging operation on the loaded images. The expected result is an image with medium sharpness.
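Note that adding two uint8 images naively overflows, so a wider intermediate type is needed (a sketch; cv2.addWeighted is a ready-made alternative):

```python
import numpy as np

def average_images(img_a, img_b):
    # cast up before adding so sums above 255 are not wrapped around
    total = img_a.astype(np.uint16) + img_b.astype(np.uint16)
    return (total // 2).astype(np.uint8)

a = np.full((2, 2), 200, dtype=np.uint8)
b = np.full((2, 2), 100, dtype=np.uint8)
avg = average_images(a, b)  # every pixel becomes 150
```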

If the areas in which the sharpness of the image is high were specified, it would be possible to assemble the sharp image in every place (for this, convolution operations are needed, which will be introduced in the next classes).

Task 4

The task is to practice arithmetic operations on images and simple feature detection based on single-point processing.

Using the Lenna image, find areas in the image for which grayscale values are in the range 120-160.

Then propose operations that will return the inverse of the Lenna image (RGB or BGR for cv2) for selected areas, and copy the pixels from the Lenna image (RGB or BGR for cv2) for the remaining areas.

Geometric transformations

In addition to operations modifying a single pixel, there are also those that transform the geometry of the entire image. The basic geometric transformations include:

The above operations are called affine operations and can be represented as:

$$ y = Tx +b$$

where $x$ is the pixel position vector $(i,j)$ in the input image, $y$ is the pixel position vector $(i',j')$ in the output image, $b$ is the translation vector, and $T$ is the transformation matrix.

In order for the affine transformation to take only one parameter $T$, homogeneous coordinates are used: the pixel position vectors are extended with a third component equal to 1.

Thus, the general form of an affine transform is:

$$ \begin{bmatrix} i'\\ j'\\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ 0 & 0 & 1 \end{bmatrix} * \begin{bmatrix} i\\ j\\ 1 \end{bmatrix} $$

Then, the basic operations can be defined as:

The above operations can be composed by matrix multiplication.
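For example, in homogeneous coordinates the composition is a plain matrix product (a sketch with hypothetical helper names):

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx],
                     [0, 1, ty],
                     [0, 0, 1]], dtype=np.float64)

def scaling(sx, sy):
    return np.array([[sx, 0, 0],
                     [0, sy, 0],
                     [0, 0, 1]], dtype=np.float64)

# first scale by 2, then translate by (5, 3); note the right-to-left order
M = translation(5, 3) @ scaling(2, 2)

point = np.array([1, 1, 1])  # pixel (1, 1) in homogeneous form
moved = M @ point            # -> (7, 5, 1)
```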

Note:
OpenCV contains a function for applying affine transformations; however, since the last row always has the same form for the basic operations ($[0, 0, 1]$), it takes the transformation in the form:

$$ T = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23} \end{bmatrix} $$

The entire matrix is used in more advanced transformations, e.g. in a perspective transformation or homography: $$ T = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix} $$

The above example shows that the intermediate results of affine operations can be lossy. Notice that after the first shift operation, part of the image moves out of the frame and is lost. This corrupts the later processing (despite the mathematically correct notation).

The solution is to combine affine operations using matrix multiplication. Below is a single transform containing all of the above operations, while not losing information between operations.

2D Transformations

| Transformation name | Included operations | Preserves |
| --- | --- | --- |
| translation | translation | straight lines, parallelism, angles, lengths, orientation |
| rigid (Euclidean) | translation, rotation | straight lines, parallelism, angles, lengths |
| similarity | translation, rotation, scaling | straight lines, parallelism, angles |
| affine | translation, rotation, scaling, affine | straight lines, parallelism |
| projective | translation, rotation, scaling, affine, projective | straight lines |

Histogram

Computing a histogram consists of counting the number of pixels of each value. In other words, the histogram shows how many pixels of a given intensity there are in the image.

To calculate a histogram for an image, we can use a ready-made function: hist() from matplotlib or histogram() from numpy.
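A minimal numpy sketch (one bin per possible uint8 value):

```python
import numpy as np

img = np.array([[0, 0, 128],
                [128, 128, 255]], dtype=np.uint8)

# 256 bins over [0, 256) so that bin i counts pixels of exactly value i
hist, bin_edges = np.histogram(img.ravel(), bins=256, range=(0, 256))
```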

The image histogram may reveal irregularities in how often particular pixel intensities occur. When the histogram is unbalanced, i.e. certain intensity ranges dominate the image, we can use histogram equalization so that the transformed image has a more even distribution of all intensities.

Thanks to this operation, we can transform very dark images, in which no characteristic points are visible, in such a way that changes in pixel intensity highlight previously invisible changes in the image.

Histogram equalization is implemented in the OpenCV library as equalizeHist which takes an image as input and returns the image after transformation.

To perform histogram equalization manually, first compute the cumulative distribution function of the pixel intensities. The cumulative distribution function tells us the probability that a randomly selected pixel of the image has an intensity less than or equal to a given value.

The next step is to normalize the obtained cumulative distribution (so that the values of its codomain are in the range [0, 255]). In this way we obtain our own lookup table, as introduced earlier in the class.
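The steps above can be sketched as follows (using numpy fancy indexing in place of cv2.LUT; the normalization shown is the simplest variant, dividing by the total pixel count):

```python
import numpy as np

# a tiny image whose intensities occupy only the narrow range 100-103
img = np.array([[100, 101],
                [102, 103]], dtype=np.uint8)

# histogram and its cumulative sum (the unnormalized cdf)
hist, _ = np.histogram(img.ravel(), bins=256, range=(0, 256))
cdf = hist.cumsum()

# normalize the cdf to [0, 255] to obtain a lookup table
lut = (cdf * 255.0 / cdf[-1]).astype(np.uint8)

equalized = lut[img]  # apply the table with fancy indexing
```

After equalization, the narrow intensity range is stretched across the whole [0, 255] scale.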

Having this table, we can use the LUT() operation to obtain the equalized image.